Summary

We apply the Box and Cox (1964) power transformation family and the robust alternatives developed by Bickel and Doksum (1981) and Carroll (1980) to two data sets given by John (1978). The robust methods perform quite well compared with normal theory likelihood methods.
1. Introduction

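Our basic framework for transformation is the power family (Box and Cox (1964)); for some unknown $\lambda$,

$$Y_i^{(\lambda)} = x_i \beta + \sigma \epsilon_i, \qquad i = 1, \ldots, N, \tag{1.1}$$

where

$$Y^{(\lambda)} = \begin{cases} (Y^{\lambda} - 1)/\lambda, & \lambda \neq 0, \\ \log Y, & \lambda = 0. \end{cases}$$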


Here $\{x_i\}$ are $(1 \times p)$ design vectors, $\beta$ is a $(p \times 1)$ regression parameter, $\sigma$ is a scaling constant, and $\{\epsilon_i\}$ are independently and identically distributed with mean zero and distribution $F$. Ideally $F$ would be the standard normal distribution function $\Phi$, but in general normality, linearity and homoscedasticity may not be simultaneously attainable, so we think of $F$ as symmetric and almost normal.
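The transformation $Y^{(\lambda)}$ is simple to state in code. The following Python sketch (the names `box_cox` and `log_jacobian` are our own, not the paper's) computes it for a positive response, together with the log-Jacobian term $(\lambda - 1)\sum \log Y_i$ that enters the objective function (1.5) below.

```python
import math
from typing import Sequence


def box_cox(y: float, lam: float) -> float:
    """Box-Cox power transformation Y^(lambda) for a positive response y."""
    if y <= 0:
        raise ValueError("the power family requires positive responses")
    if lam == 0:
        return math.log(y)  # the limiting case lambda -> 0
    return (y ** lam - 1.0) / lam


def log_jacobian(ys: Sequence[float], lam: float) -> float:
    """(lambda - 1) * sum(log Y_i): log-Jacobian of the map Y -> Y^(lambda)."""
    return (lam - 1.0) * sum(math.log(y) for y in ys)


# lambda = 1 only shifts the data by -1; lambda = 0.5 is a rescaled square root
print(box_cox(3.0, 1.0), box_cox(4.0, 0.5), box_cox(math.e, 0.0))  # 2.0 2.0 1.0
```

Because $(Y^\lambda - 1)/\lambda \to \log Y$ as $\lambda \to 0$, the family is continuous in $\lambda$, which is what allows $\lambda$ to be treated as a continuous parameter.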

Box and Cox (1964), Andrews (1971), Atkinson (1973), Bickel (unpublished) and Carroll (1980) have considered the problem of testing whether a given value $\lambda_0$ results in the model (1.1), i.e., they test

$$H_0: \lambda = \lambda_0. \tag{1.2}$$

Box and Cox proposed a likelihood ratio test, while Atkinson proposed a computationally simpler variant; both have good power properties when $F = \Phi$, but Carroll (1980) shows they are sensitive to outliers and have highly inflated test levels (Type I errors) when $F \neq \Phi$. The tests proposed by Andrews and Bickel hold the correct test levels when $F \neq \Phi$ but are not very powerful when $F = \Phi$.

Because the normal theory likelihood estimates are very sensitive to outliers, Bickel and Doksum (1981) and Carroll (1980) introduced robust methods. Let $\rho$ be a (usually) convex function, let $\psi = \rho'$ be odd, and let $\chi$ be an even function.
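For a given $\lambda$, define $\beta(\lambda)$ and $\sigma(\lambda)$ as the solutions to

$$\sum_{i=1}^N \psi(r_i(\lambda))\, x_i = 0, \tag{1.3}$$

$$\sum_{i=1}^N \chi(r_i(\lambda)) = 0, \tag{1.4}$$

where

$$r_i(\lambda) = \bigl(Y_i^{(\lambda)} - x_i \beta\bigr)/\sigma.$$

One then minimizes the function

$$\ell(\lambda) = N \log \sigma(\lambda) + \sum_{i=1}^N \rho\!\left(\frac{Y_i^{(\lambda)} - x_i \beta(\lambda)}{\sigma(\lambda)}\right) - (\lambda - 1) \sum_{i=1}^N \log Y_i. \tag{1.5}$$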
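To make the two-stage procedure concrete, here is a minimal pure-Python sketch for the "Huber" case with $\chi(y) = y\psi(y) - 1$ (the choice (1.9) adopted later in the paper). It solves (1.3) by iteratively reweighted least squares and (1.4) by a fixed-point update of the scale, then evaluates $\ell(\lambda)$ from (1.5) on a grid. The solver, iteration count, starting values, grid and all names are our assumptions, not the authors' algorithm, and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

K = 2.0  # Huber tuning constant; the "Huber" fits in Section 2 use k = 2.0


def psi(x: float) -> float:
    """Huber psi: x clipped to [-K, K]."""
    return max(-K, min(K, x))


def rho(x: float) -> float:
    """Huber rho, whose derivative is psi."""
    return 0.5 * x * x if abs(x) <= K else K * abs(x) - 0.5 * K * K


def box_cox(y: float, lam: float) -> float:
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_beta_sigma(X: Sequence[Sequence[float]], z: Sequence[float],
                   iters: int = 300) -> Tuple[List[float], float]:
    """Approximate (beta(lambda), sigma(lambda)) solving (1.3)-(1.4)."""
    n, p = len(z), len(X[0])
    w = [1.0] * n  # the first pass is ordinary least squares
    sigma = None
    for _ in range(iters):
        xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        xtwz = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(p)]
        beta = solve(xtwx, xtwz)
        e = [z[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
        if sigma is None:
            sigma = math.sqrt(sum(ei * ei for ei in e) / n)
        # Scale step: a fixed point satisfies N^{-1} sum r_i psi(r_i) = 1, i.e. (1.4)
        sigma = math.sqrt(sum(ei * psi(ei / sigma) * sigma for ei in e) / n)
        # IRLS weights psi(r)/r: a fixed point satisfies sum psi(r_i) x_i = 0, i.e. (1.3)
        w = [psi(ei / sigma) / (ei / sigma) if ei != 0 else 1.0 for ei in e]
    return beta, sigma


def ell(lam: float, X, y: Sequence[float]) -> float:
    """Objective (1.5): N log sigma + sum rho(r_i) - (lambda - 1) sum log Y_i."""
    z = [box_cox(yi, lam) for yi in y]
    beta, sigma = fit_beta_sigma(X, z)
    r = [(z[i] - sum(xa * ba for xa, ba in zip(X[i], beta))) / sigma
         for i in range(len(y))]
    return (len(y) * math.log(sigma) + sum(rho(ri) for ri in r)
            - (lam - 1.0) * sum(math.log(yi) for yi in y))


# Hypothetical data: exactly log-linear apart from small alternating noise
xs = range(1, 11)
X = [[1.0, float(x)] for x in xs]
y = [math.exp(0.2 * x + 0.01 * (-1) ** x) for x in xs]
grid = [i / 10 for i in range(-20, 21)]
lam_star = min(grid, key=lambda lam: ell(lam, X, y))
L_stat = 2.0 * (ell(1.0, X, y) - ell(lam_star, X, y))  # statistic (1.6), lambda_0 = 1
print(lam_star, round(L_stat, 1))
```

For these log-linear data the grid minimum should sit at or near $\lambda = 0$, and the statistic for $\lambda_0 = 1$ is large. A finer grid or a one-dimensional optimizer could replace the grid search.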


When $\rho(x) = x^2/2$ and $\chi(x) = x^2 - 1$, we obtain the maximum likelihood estimates of the parameters $(\lambda, \beta, \sigma)$ when $F = \Phi$. In general, (1.5) is, up to an additive constant, the negative log-likelihood when $F$ has density proportional to $\exp(-\rho(x))$, and (1.3)-(1.4) lead to Huber's Proposal 2 (1973) for robust regression. Bickel and Doksum obtain the limiting distributions for the estimates $(\lambda^*, \beta^*, \sigma^*)$, showing that they have better robustness properties than the normal theory MLE. Other recent references are Carroll (1981b, 1981c), Carroll and Ruppert (1981), Doksum and Wong (1981) and Hernandez and Johnson (1981).

In this paper we apply the robust methods to the two data sets given by John (1978). John (1978) and Carroll (1981a) originally studied these data sets because both exhibit possible outliers; Carroll's (1981a) reanalysis is based on robust methods without transformation. In both data sets the responses are positive, so the simple model (1.1) is easy to apply. We focus primarily on estimating $\lambda$ and testing whether it equals a specified value, i.e., we test (1.2) for various $\lambda_0$.

Carroll (1980) proposed testing (1.2) by treating the function $\ell(\lambda)$ in (1.5) as if it were a likelihood, rejecting $H_0: \lambda = \lambda_0$ if

$$L = 2\{\ell(\lambda_0) - \ell(\lambda^*)\} > c_\alpha, \tag{1.6}$$

where $\lambda^*$ minimizes $\ell(\lambda)$ and $c_\alpha$ is the appropriate percentage point of the chi-square distribution with one degree of freedom.

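For the choice

$$\psi(x) = -\psi(-x) = \begin{cases} x, & 0 \le x \le k, \\ k, & x > k, \end{cases}$$

$$\chi(y) = \psi^2(y) - \int_{-\infty}^{\infty} \psi^2(x)\,(2\pi)^{-1/2} \exp(-x^2/2)\,dx, \tag{1.7}$$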


he found that such a test is somewhat of a compromise among those previously proposed. It has good power properties even when $F \neq \Phi$ and an approximately correct level at the normal distribution; away from the normal its level varies and can be higher than desired, although the problem is not as severe as for the normal theory likelihood ratio test.
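The constant subtracted in (1.7) is $E_\Phi \psi^2(Z)$ for a standard normal $Z$. For the Huber $\psi$ it has the closed form $(2\Phi(k) - 1) - 2k\phi(k) + 2k^2(1 - \Phi(k))$, where $\phi$ is the standard normal density (a standard calculation, not stated in the paper). Subtracting it makes $E_\Phi \chi(Z) = 0$, so that at the normal the scale equation (1.4) is unbiased. A small Python sketch, with function names of our choosing:

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def huber_psi(x: float, k: float) -> float:
    return max(-k, min(k, x))


def expected_psi_sq(k: float) -> float:
    """E[psi_k(Z)^2] for Z ~ N(0, 1): the integral in (1.7)."""
    inner = (2.0 * Phi(k) - 1.0) - 2.0 * k * phi(k)  # E[Z^2; |Z| <= k]
    outer = 2.0 * k * k * (1.0 - Phi(k))               # k^2 * P(|Z| > k)
    return inner + outer


def chi_proposal2(y: float, k: float) -> float:
    """Huber's Proposal 2 chi from (1.7): psi^2(y) minus its normal expectation."""
    return huber_psi(y, k) ** 2 - expected_psi_sq(k)


print(round(expected_psi_sq(2.0), 4))  # 0.9205
```

For $k = 2.0$, the value used for the "Huber" fits in Section 2, the constant is about 0.9205; as $k \to \infty$ it tends to 1 and $\chi(y) \to y^2 - 1$.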

One can study the general test statistic (1.6) by using the asymptotic theory of Bickel and Doksum (1981), who achieve major simplifications by letting $\sigma \to 0$ and $N \to \infty$ simultaneously. It turns out that one can prove the following.


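Result. Define $\chi(y) = y\psi(y) - 1$, $\hat r_i = r_i(\lambda^*)$, and

$$\hat E\psi' = N^{-1} \sum_{i=1}^N \psi'(\hat r_i), \qquad \hat E\psi^2 = N^{-1} \sum_{i=1}^N \psi^2(\hat r_i).$$

Then as $N \to \infty$ and $\sigma \to 0$, under the hypothesis $H_0: \lambda = \lambda_0$, the statistic

$$L^* = (\hat E\psi')(\hat E\psi^2)^{-1} L \tag{1.8}$$

has a chi-square distribution with one degree of freedom.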
Details are given in the appendix. The statistic (1.8) is similar to one given by Schrader and Hettmansperger (1980). The choice

$$\chi(y) = y\psi(y) - 1 \tag{1.9}$$

is suggested by Bickel and Doksum, and we will use it throughout this paper. The result is of limited practical interest (see the example in the appendix), but at least it suggests a plausible choice for $\chi$.
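In practice the rescaling (1.8) needs only the residuals $\hat r_i = r_i(\lambda^*)$ and the value of $L$ from (1.6). The sketch below computes $L^*$ and an upper-tail probability for the chi-square distribution with one degree of freedom, using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The function names, and passing $\psi$ and $\psi'$ as plain functions, are our own choices; the residuals in the example are made up.

```python
import math
from typing import Callable, Sequence


def rescaled_lr(L: float, r_hat: Sequence[float],
                psi: Callable[[float], float],
                dpsi: Callable[[float], float]) -> float:
    """L* = (E-hat psi') (E-hat psi^2)^{-1} L, as in (1.8)."""
    n = len(r_hat)
    e_dpsi = sum(dpsi(r) for r in r_hat) / n
    e_psi2 = sum(psi(r) ** 2 for r in r_hat) / n
    return e_dpsi / e_psi2 * L


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def huber(x: float, k: float = 2.0) -> float:
    return max(-k, min(k, x))


def huber_d(x: float, k: float = 2.0) -> float:
    return 1.0 if abs(x) <= k else 0.0


L_star = rescaled_lr(4.25, [0.5, -0.5, 3.0, -3.0], huber, huber_d)
print(round(L_star, 6), round(chi2_1_sf(3.841458820694124), 4))  # 1.0 0.05
```

For $\psi(x) = x$ with residuals normalized by (1.4) under (1.9), both estimated moments equal one and $L^* = L$, the normal theory likelihood ratio statistic.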

2.  Applications 

In this section we apply the methods we have discussed to two data sets introduced by John (1978). Following Bickel and Doksum (1981), we set $\chi(x) = x\psi(x) - 1$ and use the following three choices of $\psi$:

(MLE) $\psi(x) = x$

("Huber") $\psi(x)$ as in (1.7), with $k = 2.0$

("Hampel")

$$\psi(x) = -\psi(-x) = \begin{cases} x, & 0 \le x \le a, \\ a, & a < x \le b, \\ a(c - x)/(c - b), & b < x \le c, \\ 0, & x > c, \end{cases} \qquad a = 2.0,\ b = 3.5,\ c = 5.0.$$
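The three-part "Hampel" $\psi$ can be coded directly. The sketch below also returns $\psi'$, which is what $\hat E\psi'$ in (1.8) needs; on $b < |x| \le c$ the slope is $-a/(c - b) = -4/3$, so $\psi$ redescends. The names and structure are ours.

```python
import math

A, B, C = 2.0, 3.5, 5.0  # the constants a, b, c used for "Hampel" in Section 2


def hampel_psi(x: float, a: float = A, b: float = B, c: float = C) -> float:
    """Hampel's three-part redescending psi, extended as an odd function."""
    u = abs(x)
    if u <= a:
        return x
    if u <= b:
        v = a
    elif u <= c:
        v = a * (c - u) / (c - b)
    else:
        v = 0.0
    return math.copysign(v, x)


def hampel_dpsi(x: float, a: float = A, b: float = B, c: float = C) -> float:
    """Derivative of hampel_psi; at |x| = a, b, c the inner piece's value is used."""
    u = abs(x)
    if u <= a:
        return 1.0
    if u <= b:
        return 0.0
    if u <= c:
        return -a / (c - b)
    return 0.0


print([hampel_psi(x) for x in (1.0, 3.0, 4.25, 6.0, -3.0)])
# [1.0, 2.0, 1.0, 0.0, -2.0]
```

The negative derivative on $(b, c]$ is what makes $\psi$ non-monotone, and it is also why an iterative solver started far from the solution can misbehave.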

We include the "Hampel" $\psi$ because the data sets have potential outliers and, as shown in Carroll (1980), the influence function of $\lambda^*$ is not bounded if $\psi$ is monotone.

A word of caution about "Hampel" is in order. Because its $\psi$ is not monotone, convergence difficulties may arise. Hence, in minimizing the function (1.5) with the "Hampel" $\psi$, we find the values of $\beta(\lambda)$ and $\sigma(\lambda)$ by first solving for the "Huber" $\psi$ and then doing two iterations of the weighted least squares algorithm with the "Hampel" $\psi$. In all examples, the function $\ell(\lambda)$ attained a unique minimum on the interval $[\ldots, 2.0]$.
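The start-up scheme just described (solve with the "Huber" $\psi$, then take two weighted least squares steps with the "Hampel" $\psi$) can be sketched for a straight-line fit. This illustration rests on our own assumptions: a model with intercept and slope only, weights $\psi(r)/r$, and a scale updated after each step by the fixed-point rule for (1.4) with $\chi(y) = y\psi(y) - 1$. The paper does not say whether the scale is updated during the two "Hampel" steps, and the data are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def huber_psi(x: float, k: float = 2.0) -> float:
    return max(-k, min(k, x))


def hampel_psi(x: float, a: float = 2.0, b: float = 3.5, c: float = 5.0) -> float:
    u = abs(x)
    if u <= a:
        return x
    v = a if u <= b else (a * (c - u) / (c - b) if u <= c else 0.0)
    return math.copysign(v, x)


def wls_line(x: Sequence[float], z: Sequence[float],
             w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares intercept and slope for z = b0 + b1 * x."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b1 = (sw * swxz - swx * swz) / (sw * swxx - swx * swx)
    return (swz - b1 * swx) / sw, b1


def step(x, z, beta, sigma, psi: Callable[[float], float]):
    """One reweighted fit for beta, then one fixed-point update of sigma."""
    r = [(zi - beta[0] - beta[1] * xi) / sigma for xi, zi in zip(x, z)]
    w = [psi(ri) / ri if ri != 0 else 1.0 for ri in r]
    beta = wls_line(x, z, w)
    e = [zi - beta[0] - beta[1] * xi for xi, zi in zip(x, z)]
    sigma = math.sqrt(sum(ei * psi(ei / sigma) * sigma for ei in e) / len(e))
    return beta, sigma


def fit_hampel(x, z, huber_iters: int = 200, hampel_steps: int = 2):
    beta = wls_line(x, z, [1.0] * len(x))  # start from least squares
    e = [zi - beta[0] - beta[1] * xi for xi, zi in zip(x, z)]
    sigma = math.sqrt(sum(ei * ei for ei in e) / len(e))
    for _ in range(huber_iters):   # stage 1: iterate the "Huber" fit
        beta, sigma = step(x, z, beta, sigma, huber_psi)
    for _ in range(hampel_steps):  # stage 2: two "Hampel" steps
        beta, sigma = step(x, z, beta, sigma, hampel_psi)
    return beta, sigma


# Hypothetical data: z = 1 + 2x with small alternating noise, one gross outlier at x = 5
xs = [float(i) for i in range(1, 21)]
zs = [1.0 + 2.0 * xi + 0.1 * (-1) ** int(xi) for xi in xs]
zs[4] += 200.0
(b0, b1), s = fit_hampel(xs, zs)
print(round(b0, 2), round(b1, 3))
```

In this example the outlier's standardized residual after the "Huber" stage lies beyond $c = 5$, so the "Hampel" steps give it weight zero and the final line is essentially the least squares fit to the remaining points.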




The first data set is particularly interesting. In the original scale of the data ($\lambda = 1$), both John (1978) and Carroll (1981a) conclude that the data point with response $Y = 14$ is an extreme outlier, but that except for this point the normal linear model fits well. An acceptable analysis would thus estimate $\lambda$ to be somewhere near 1. As predicted by the influence function calculations in Carroll (1980), the "MLE" estimate of $\lambda$ is much more sensitive to the outlier than the "Huber", which in turn is more sensitive than the "Hampel"; see Table 1 for details.

When we treat observation #11, with response $Y = 14$, as an outlier and replace it by John's suggested value $Y = 62.33$, we obtain the results given in Table 2. All three methods now give essentially the same answer, and it seems reasonable to accept $H_0: \lambda = 1.0$ and to conclude that no transformation is really necessary. From a mechanical viewpoint, a combination of transformation and fitting using the "Hampel" $\psi$ seems to give the best overall analysis. However, the best practice would be to use all three methods for the most revealing analysis.

In Table 3 we present estimates of $\lambda$ and the test statistic for $H_0: \lambda = 1.0$ obtained by varying observation #11. It is interesting to note that even the "Hampel" estimate is not insensitive to the changing observation, although we can always conclude that no transformation is really necessary.
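The conclusion drawn from Table 3 can be checked directly. Assuming a 5% level (our choice; the paper does not fix $\alpha$ here), the critical value is 3.841, the upper 5% point of the chi-square distribution with one degree of freedom, and none of the $L^*(1.0)$ values in Table 3 exceeds it. The sketch simply transcribes that column of the table.

```python
import math

CRIT_5PCT = 3.841  # upper 5% point of chi-square with one degree of freedom


def chi2_1_sf(x: float) -> float:
    """P(chi-square_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# L*(1.0) for "MLE", "Huber", "Hampel", keyed by the value given to observation #11
table3 = {
    14.00: (3.34, 2.10, 0.00),
    18.83: (1.98, 1.28, 1.73),
    23.67: (1.02, 0.72, 0.20),
    33.31: (0.30, 0.28, 0.28),
    43.00: (0.35, 0.38, 0.38),
    52.67: (0.71, 0.76, 0.76),
    62.33: (0.48, 0.50, 0.50),
}
methods = ("MLE", "Huber", "Hampel")
rejections = [(y11, m) for y11, stats in table3.items()
              for m, s in zip(methods, stats) if s > CRIT_5PCT]
smallest_p = min(chi2_1_sf(s) for stats in table3.values() for s in stats)
print(rejections, round(smallest_p, 3))  # [] 0.068
```

The closest call is the "MLE" with the original value $Y = 14$, whose upper-tail probability is still above 0.05.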

For the second data set, all three methods indicate that logarithms would be an acceptable transformation (see Carroll (1981b) for a discussion of the value of moving the MLE of $\lambda$ to an easily interpretable value).
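The same reading can be applied to Table 4. Again assuming a 5% level (our choice), the sketch lists, for each method, which of the tabulated values $\lambda_0$ are not rejected, and which are accepted by all three.

```python
CRIT_5PCT = 3.841  # upper 5% point of chi-square with one degree of freedom
lambda0_values = (1.0, 0.5, 0.0, -0.5, -1.0)
# L*(lambda_0) from Table 4, in the order of lambda0_values
table4 = {
    "MLE": (21.45, 4.32, 0.36, 9.25, 25.95),
    "Huber": (19.69, 3.37, 0.53, 9.08, 25.51),
    "Hampel": (15.06, 2.25, 0.53, 9.08, 25.49),
}
accepted = {m: [l0 for l0, s in zip(lambda0_values, stats) if s <= CRIT_5PCT]
            for m, stats in table4.items()}
common = set.intersection(*(set(v) for v in accepted.values()))
print(accepted)  # {'MLE': [0.0], 'Huber': [0.5, 0.0], 'Hampel': [0.5, 0.0]}
print(common)    # {0.0}
```

Only $\lambda_0 = 0$ (logarithms) is accepted by all three methods at this level; the two robust fits would also accept $\lambda_0 = 0.5$.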

These examples, the empirical work in Carroll (1981a) and substantial theoretical work such as Huber (1977) all point to the desirability of using robust methods in transforming and analyzing data, along, of course, with other standard tools.
Table 1

The first data set described by John (1978). The estimation methods are as described in (1.3)-(1.5), while the test statistic L* is given by (1.8)-(1.9). These are the original data.

Method      λ*      L*(1.0)   L*(0.5)   L*(0.0)   L*(-0.5)   L*(-1.0)
"MLE"       1.91    3.3       8.2       15.1      24.2       35.1
"Huber"     1.66    2.1       6.7       13.9      23.6       35.8
"Hampel"    1.02    0.0       1.1       4.4       9.7        16.7


Table 2

This is the first data set described by John, except that observation #11 (Y = 14) has been modified to Y = 62.33. See Table 1 for more details.


Table 3

Various values of λ* and L*(1.0) for John's (1978) first data set when observation #11 is varied.

Observation #11   "MLE" λ*   "MLE" L*(1.0)   "Huber" λ*   "Huber" L*(1.0)   "Hampel" λ*   "Hampel" L*(1.0)
14.00             1.91       3.34            1.66         2.10              1.02          .00
18.83             1.77       1.98            1.55         1.28              .70           1.73
23.67             1.58       1.02            1.43         .72               1.20          .20
33.31             1.30       .30             1.28         .28               1.28          .28
43.00             1.30       .35             1.32         .38               1.31          .38
52.67             1.41       .71             1.43         .76               1.43          .76
62.33             1.31       .48             1.30         .50               1.30          .50
Table 4

The second data set given by John (1978). See Table 1 for conventions.

Method      λ*     L*(1.0)   L*(0.5)   L*(0.0)   L*(-0.5)   L*(-1.0)
"MLE"       .11    21.45     4.32      .36       9.25       25.95
"Huber"     .15    19.69     3.37      .53       9.08       25.51
"Hampel"    .15    15.06     2.25      .53       9.08       25.49
Appendix

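Example: Consider regression through the origin with $\lambda = 0$, $\sigma = 1$:

$$\log Y_i = \beta x_i + \epsilon_i,$$

$$\sum_{i=1}^N x_i = \sum_{i=1}^N x_i^3 = 0, \qquad \sum_{i=1}^N x_i^2 = N, \qquad \sum_{i=1}^N x_i^4 = N\mu_4.$$

If one tests $H_0: \lambda = 0$ by the likelihood ratio test (which is just (1.8) with $\rho(x) = x^2/2$, $\chi(y) = y^2 - 1$), standard likelihood methods show that when $H_0$ is true, $L = L^*$ converges in distribution to $dZ$, where $Z$ has a chi-square distribution with one degree of freedom and

$$d = \bigl(E\epsilon^6 - 4E\epsilon^4 + 4 + \beta^2(6E\epsilon^4 - 8) + \beta^4\mu_4\bigr)\bigl(7E\epsilon^4/3 + 10\beta^2 + \beta^4\mu_4\bigr)^{-1}.$$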

The constant $d = 1$ when $F = \Phi$ and one can actually transform to a normal distribution, but in general $d \neq 1$, so the test $L^*$ does not always have the correct asymptotic level.

Bickel and Doksum (1981) study the asymptotic behavior of $(\lambda^*, \beta^*, \sigma^*)$ by letting $\sigma \to 0$ at a known rate as $N \to \infty$. Define

$$B_{11} = (E\psi')\,N^{-1}\sum_{i=1}^N q_i^2, \qquad B_{22} = (E\psi')\,N^{-1}\sum_{i=1}^N x_i' x_i, \qquad B_{33} = E\,\epsilon_1 \chi'(\epsilon_1),$$

$$B_{12} = (E\psi')\,N^{-1}\sum_{i=1}^N q_i x_i = B_{21}', \qquad B_{13} = B_{31} = B_{23} = B_{32} = 0.$$

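Without stating the precise details, suffice it to say that they show that as $N \to \infty$ and $\sigma \to 0$,

$$N^{1/2}\bigl((\lambda^*, \beta^*, \sigma^*) - (\lambda, \beta, \sigma)\bigr)/\sigma = N^{-1/2}\sum_{i=1}^N B^{-1} W_i + o_p(1), \tag{A1}$$

where $B = (B_{ij})$ and

$$W_i = \bigl(q_i \psi(\epsilon_i),\; x_i' \psi(\epsilon_i),\; \chi(\epsilon_i)\bigr).$$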

Bickel and Doksum, Carroll and Ruppert (1980) and Carroll (1981b, c) discuss the interesting point, outside the scope of this paper, that (A1) implies that $(\beta^* - \beta)/\sigma$ is asymptotically normally distributed with mean zero and covariance $(E\psi^2)S/((E\psi')^2 N)$, where

$$S = \Bigl(N^{-1}\sum_{i=1}^N x_i' x_i\Bigr)^{-1} + Q \tag{A2}$$

and $Q$ is positive semi-definite. This distribution differs from the one obtained when $\lambda$ is known by the additional term $Q$.
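The factor $(E\psi^2)/(E\psi')^2$ multiplying $S$ is the usual variance inflation of an M-estimator relative to least squares. At the normal it is at least one (by Stein's identity $E\psi'(Z) = E[Z\psi(Z)]$ and the Cauchy-Schwarz inequality), and for the Section 2 choices it is only slightly above one. The sketch evaluates it under $F = \Phi$ by a midpoint rule on $[-10, 10]$; the integration scheme and all names are our assumptions.

```python
import math
from typing import Callable


def normal_expectation(f: Callable[[float], float], lo: float = -10.0,
                       hi: float = 10.0, n: int = 20000) -> float:
    """E f(Z), Z ~ N(0, 1), by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def inflation(psi: Callable[[float], float], dpsi: Callable[[float], float]) -> float:
    """(E psi^2) / (E psi')^2 under the standard normal."""
    return normal_expectation(lambda z: psi(z) ** 2) / normal_expectation(dpsi) ** 2


def huber(x: float, k: float = 2.0) -> float:
    return max(-k, min(k, x))


def huber_d(x: float, k: float = 2.0) -> float:
    return 1.0 if abs(x) <= k else 0.0


def hampel(x: float, a: float = 2.0, b: float = 3.5, c: float = 5.0) -> float:
    u = abs(x)
    v = u if u <= a else a if u <= b else a * (c - u) / (c - b) if u <= c else 0.0
    return math.copysign(v, x)


def hampel_d(x: float, a: float = 2.0, b: float = 3.5, c: float = 5.0) -> float:
    u = abs(x)
    return 1.0 if u <= a else 0.0 if u <= b else -a / (c - b) if u <= c else 0.0


for name, p, d in [("MLE", lambda x: x, lambda x: 1.0),
                   ("Huber", huber, huber_d), ("Hampel", hampel, hampel_d)]:
    print(name, round(inflation(p, d), 4))
```

For the "Huber" choice with $k = 2.0$ the factor is about 1.0104, so little efficiency is lost at the normal.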


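Define $(\hat\beta_0, \hat\sigma_0)$ as the solutions to (1.3)-(1.4) using $\lambda_0$, i.e., $\hat\beta_0 = \beta(\lambda_0)$ and $\hat\sigma_0 = \sigma(\lambda_0)$. Detailed calculations based on (A1) show that when $H_0: \lambda = \lambda_0$ is true,

$$N^{1/2}(\hat\sigma_0 - \sigma^*)/\sigma \xrightarrow{P} 0, \tag{A3}$$

$$N^{1/2}(\lambda^* - \lambda_0)/\sigma \xrightarrow{\mathcal{D}} N(0, \theta), \tag{A4}$$

where

$$\theta = (E\psi^2)(E\psi')^{-2} \lim_{N \to \infty} \Bigl[N^{-1}\sum_{i=1}^N q_i^2 - \Bigl(N^{-1}\sum_{i=1}^N q_i x_i\Bigr)^2\Bigr]^{-1}.$$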
We are now in a position to state

Theorem A. When $H_0: \lambda = \lambda_0$ is true, asymptotically as $N \to \infty$ and $\sigma \to 0$ the statistic

$$L^{**} = (\hat E\psi')(\hat E\psi^2)^{-1}(L + D) \tag{A5}$$

is distributed as chi-square with one degree of freedom, where

$$\hat r_i = r_i(\lambda^*), \qquad D = 2N\bigl((\hat\sigma_0 - \sigma^*)/\sigma^*\bigr)\Bigl(N^{-1}\sum_{i=1}^N \hat r_i \psi(\hat r_i) - 1\Bigr),$$

$$\hat E\psi' = N^{-1}\sum_{i=1}^N \psi'(\hat r_i), \qquad \hat E\psi^2 = N^{-1}\sum_{i=1}^N \psi^2(\hat r_i).$$

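When $\chi$ is given by (1.7), the term $D$ in (A5) is non-zero and can be of considerable importance. When $\chi$ is given by (1.9), equation (1.4) evaluated at $\lambda^*$ reads $N^{-1}\sum \hat r_i \psi(\hat r_i) = 1$, so $D = 0$ and we obtain $L^* = L^{**}$.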
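The correction term $D$ is easy to evaluate once the fits at $\lambda_0$ and at $\lambda^*$ are available. The sketch below computes $D$ and $L^{**}$ from (A5); we read the unstarred scale in $D$ as $\hat\sigma_0 = \sigma(\lambda_0)$, and the function and argument names, as well as the residuals in the example, are hypothetical. The two calls contrast residuals with $N^{-1}\sum \hat r_i\psi(\hat r_i) = 1$ and residuals without it.

```python
from typing import Callable, Sequence, Tuple


def theorem_a_statistic(L: float, r_hat: Sequence[float], sigma0_hat: float,
                        sigma_star: float, psi: Callable[[float], float],
                        dpsi: Callable[[float], float]) -> Tuple[float, float]:
    """Return (L**, D) from (A5), given L from (1.6) and r_hat_i = r_i(lambda*)."""
    n = len(r_hat)
    mean_rpsi = sum(r * psi(r) for r in r_hat) / n
    D = 2.0 * n * ((sigma0_hat - sigma_star) / sigma_star) * (mean_rpsi - 1.0)
    e_dpsi = sum(dpsi(r) for r in r_hat) / n
    e_psi2 = sum(psi(r) ** 2 for r in r_hat) / n
    return e_dpsi / e_psi2 * (L + D), D


def identity(x: float) -> float:
    return x


def one(x: float) -> float:
    return 1.0


def huber(x: float, k: float = 2.0) -> float:
    return max(-k, min(k, x))


def huber_d(x: float, k: float = 2.0) -> float:
    return 1.0 if abs(x) <= k else 0.0


# mean r*psi(r) equals one, as (1.4) forces under (1.9): D = 0
print(theorem_a_statistic(2.0, [1.0, -1.0, 1.0, -1.0], 1.2, 1.0, identity, one))
# (2.0, 0.0)
# mean r*psi(r) differs from one (possible with chi from (1.7)): D != 0
print(theorem_a_statistic(2.0, [0.5, -0.5, 3.0, -3.0], 1.1, 1.0, huber, huber_d))
```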

Of course, when $\rho(x) = x^2/2$, $\psi(x) = x$ and $\chi(y) = y^2 - 1$, we have $L = L^* = L^{**}$, the normal theory likelihood ratio test. The example shows that the result stated in the body of the paper depends for its validity on the assumption that $\sigma \to 0$.


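Proof of Theorem A. The proof is based upon the following lemma, which is extremely messy to obtain but uses only Taylor expansions.

Lemma A. As $N \to \infty$ we have

$$L - (D_1 + D_2 + D_3 + D_4 + D_5 + D_6) \xrightarrow{P} 0,$$

where

$$D_1 = 2N\bigl((\hat\sigma_0 - \sigma^*)/\sigma^*\bigr)\Bigl(N^{-1}\sum_i r_i(\lambda^*)\psi(r_i(\lambda^*)) - 1\Bigr),$$

$$D_2 = N\bigl((\lambda^* - \lambda)/\sigma\bigr)^2 \Bigl\{N^{-1}\sum_i \bigl(\sigma u_i \psi(\epsilon_i) + v_i^2 \psi'(\epsilon_i)\bigr) - \bigl(N^{-1}\sum_i v_i \cdots\bigr)^2\Bigr\},$$

$$D_3 = 2\bigl((\sigma^* - \sigma)/\sigma\bigr)\sum_i \bigl(\epsilon_i\psi(\epsilon_i) - r_i(\lambda^*)\psi(r_i(\lambda^*))\bigr),$$

$$D_4 = -N\bigl((\sigma^* - \sigma)/\sigma\bigr)^2,$$

$$D_5 = 2N\,E\bigl(\epsilon_1\psi(\epsilon_1)\bigr)(\hat\sigma_0 - \sigma)(\sigma^* - \sigma)/\sigma^2,$$

$$D_6 = N\Bigl(\bigl((\hat\sigma_0 - \sigma)/\sigma\bigr)^2 - \bigl((\sigma^* - \sigma)/\sigma\bigr)^2\Bigr)\bigl(E\epsilon_1^2\psi'(\epsilon_1) + 2E\epsilon_1\psi(\epsilon_1)\bigr).$$

Theorem A follows from Lemma A because of (A3) and (A4).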